AIbase

# Multimodal Document Processing

Smoldocling 256M Preview Mlx Bf16 Docling Snap
A 256M-parameter preview release of a document understanding model, designed for document structure parsing and content extraction. It converts document images into structured data.
Image-to-Text, Transformers, English
ds4sd
246
1
Udop Large 512 300k
MIT
UDOP (Universal Document Processing) is a T5-based model that unifies vision, text, and layout in a single architecture for document AI tasks.
Image-to-Text, Transformers
microsoft
264
32
Udop Large 512
MIT
UDOP (Universal Document Processing) is a T5-based model that unifies vision, text, and layout in a single architecture, suited to tasks such as document image classification, document parsing, and visual question answering.
Image-to-Text, Transformers
microsoft
193
5
© 2025 AIbase